# Convolution-enhanced ViT
CvT-w24 384 22k
Apache-2.0
CvT-w24 is a vision transformer pre-trained on ImageNet-22k and fine-tuned at 384x384 resolution. It improves on standard vision transformers by adding convolutions to the architecture.
Image Classification
Transformers

microsoft
66
0
CvT-13
Apache-2.0
CvT-13 is a hybrid architecture that combines convolutional neural networks with vision transformers. It is pre-trained on the ImageNet-1k dataset and is suited to image classification tasks.
Image Classification
Transformers

microsoft
21.80k
11